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1 Abstract 

DNA strand displacement systems have proven themselves to be fertile substrates for the design of programmable 
molecular machinery and circuitry. Domain-level reaction enumerators provide the foundations for molecular pro¬ 
gramming languages by formalizing DNA sttand displacement mechanisms and modeling interactions at the “do¬ 
main” level - one level of abstraction above models that explicitly describe DNA strand sequences. Unfortunately, the 
most-developed models currently only treat pseudo-linear DNA structures, while many systems being experimentally 
and theoretically pursued exploit a much broader range of secondary structure configurations. Here, we describe a 
new domain-level reaction enumerator that can handle arbitrary non-pseudoknotted secondary structures and reaction 
mechanisms including association and dissociation, 3-way and 4-way branch migration, and direct as well as remote 
toehold activation. To avoid polymerization that is inherent when considering general structures, we employ a time- 
scale separation technique that holds in the limit of low concentrations. This also allows us to “condense” the detailed 
reactions by eliminating fast transients, with provable guarantees of correctness for the set of reactions and their kinet¬ 
ics. We hope that the new reaction enumerator will be used in new molecular programming languages, compilers, and 
tools for analysis and verification that treat a wider variety of mechanisms of interest to experimental and theoretical 
work. We have implemented this enumerator in Python, and it is included in the DyNAMiC Workbench Integrated 
Development Environment. 

2 Introduction 

A series of inspiring demonstrations during the l ast 15 years have shown DNA to be a robust and versatile sub¬ 
strate for nanoscale constructi on and computation^^®. DNA has been used to build tile assemblies I®; arbitrary 
shapes®; mole cular m otorswalkers L i 2 | i3 | i4 | i5| ^ robots ; analog circuits that perform amplification 
of concentrations 11^41121 and polymer lengths IMMl ■ synthetic transcriptional circuits I®; molecular logic gates I®; 
Turing-u niversal s tack machines 1^; and application-specific analog and digital circuits that perform non-trivial cal¬ 
culations P4 | 25 | 26 | l| A common abstraction is to describe these complex systems in terms of a number of “domains”— 
contiguous sequences of nucleotides that are intended to participate in hybridization as a group; complementary do¬ 
mains are intended to interact, and all other domains are not. Once a system has been described in terms of domains, a 
computational sequence design package ca n geri erate a sequence of nucleotides for each domain in the system, imple¬ 
menting the desired complementarity rules Developing such increasingly complex dynamic DNA nanosystems 

requir es too ls both for the automated design of these systems, but also for software-assisted verification of their be¬ 
havior 

One mode of verification is to check whether a set of DNA complexes can react to produce the desired product 
species (or various anticipated intermediate species). Given a domain-level model, this can be done by enumerating 
the network of all possible reactions that can occur, starting from a finite set of initial complexes. The result is a 
chemical reaction network (equivalently a Petri net), connecting these initial complexes to some set of intermediate 
complexes. For the sake of this paper, we use the term “enumeration” to refer to the process of generating a chemical 
reaction network, given a set of initial complexes and a set of rules for their possible interactions. 

This enumeration process, by itself, does not fully describe the expected kinetic behavior of the system, because 
the concentration (or number of copies) of each initial species is not yet specified. However, the enumerated network 
of reactions forms the basis of a system’s dynamics, and therefore a great deal can often be observed about a system’s 
behavior simply by examining the network of reactions. Unintended side reactions that could hamper the kinetics of the 
system or cause unanticipated failure modes can be identified. Furthermore, given initial concentrations (or counts), the 
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Figure 1: The Polymerization Problem. Reaction enumerators must avoid generation of infinitely-long polymers, (a) 
Intended behavior for a simple system of two strands. Binding between the red strand and the blue strand results 
in the formation of a stable duplex (the binding could be nucleated by binding between either domain a/a* or do¬ 
main b/b* —here we show only the former for simplicity), (b) Pathological behavior in an enumerator without a 
separation of timescales. Repeated bimolecular association reactions are allowed to occur before the intramolecular 
binding reaction, generating implausibly long polymers. Depending on the order in which the enumerator searches 
for reactions, this could prevent such an enumerator from ever finding the kinetically- and thermodynamically-favored 
complex—the simple duplex, (c) Reaction network generated by our enumerator. Separation of reactions into “fast” 
(unimolecular) and “slow” (bimolecular) allows the classification of certain complexes (dashed boxes) as “transients”; 
by prohibiting the transient complexes from participating in bimolecular reactions, the pathway generated by our 
enumerator converges on the intended duplex. 


chemical reaction network can be simulated using deterministic ordinary differential equations (ODEs) or stochastic 
methods, and in some cases global properties of the dynamics can be analyzed. In dynamic DNA nanotechnology, 
this type of enumeration, like the design, has often been performed by hand; this is clearly infeasible for all but the 
most simple systems, as the number of possible unintended interactions between two species grows quadratically in 
the number of domains. Worse, the combinatorial stmcture inherent in the rearrangement and assembly of strands can 
give rise to an exponential—potentially even infinite—number of species. Thus, automated methods for performing 
the enumeration are essential, and for tractability it is necessary that they focus attention on only the most relevant 
possible complexes and reactions. Rule-based models developed for concisely representing combinatorial structures 
in systems biology, such as BioNetGenl^ or Kappa 1^, could in principle be used for DNA systems, but appropriate 
mles for the relevant DNA interactions would have to be provided by the user for each system. 

Several previous efforts have explored the in silica testing and verification of dynamic DNA nanotechnology sys¬ 
tems using a built-in set of rules. Most notably, Phillips and Cardelli have dem onstrated the DNA circuit compiler 
and analysis tool “Visual DSD,” which was subsequently extended by Lakin et al. F9 | 33 | 34| analysis component of 
DSD allows for enumeration of possible reactions between a set of initial complexes, and then simulation of those 
reactions—either using a system of ODEs or a stochastic simulation. DSD also allows a hybrid “just-in-time” enu¬ 
meration model which interleaves enumeration with simulation; in this model, reactions are stochastically selected 
for pursuit by the enumerator based on their estimated probabilities (rather than the entire network being enumerated 
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exhaustively) . However, DSD has a limited system of representation for strand displacement systems; in particular, 
the formalism cannot represent branched junction, hairpin, or multiloop structures. Similarly, the DSD model cannot 
currently express reactions relying on looped or branched intermediates, such as 4-way branch migration or remote 
toehold-mediated 3-way branch migration®. These structural moti fs and reaction mechanisms have proven useful 
in recent experimental work for self-assembly®®, locomotion®®, imaging®, and computation. Earlier work by 
Nishikawa et al. included a similar joint enumeration and simulation model that allows arbitrary secondary structures 
using strands with “abstract bases” analogous to our “domains”; combinatorial explosion was controlled by only al¬ 
lowing interactions between complexes above some threshold concentration®. Mor e recent work by Kawamata et 
al. uses a graph-based model to represent and simulate reactions between species®®. 

Here, our goal was to develop a domain-level reaction enumerator that (a) separates the enumeration and simulation 
steps so that the resulting reaction network can be rigorously analyzed, (b) models a wide range secondary structures 
and conformational mechanisms used in dynamic DNA nanotechnology, and (c) controls combinatorial explosion 
using an approximation that is valid in a well-defined limit, thus allowing simplification of the resulting network. Our 
enumerator, which we call Peppercorn, exhaustively determines the possible hybridization reactions between a set of 
initial complexes, as well as between any complexes generated as products of these reactions, yielding a complete 
reaction network. The enumerator can represent arbitrary non-pseudoknotted complexes, and handles 3-way and 4- 
way branch migration, including reactions initiated by remote toeholds. It enforces a separation of timescales in order 
to avoid implausible polymerization that would otherwise result from this degree of generality. Finally, it includes a 
scheme for “condensing” reactions to consider only slow transitions between groups of species—excluding transient 
intermediate complexes. We have implemented the Peppercorn enumerator in Python, and it has been included as part 
of the DyNAMiC Workbench Integrated Development Environment®, which is available online with a graphical user 
interface (www.molsys.net/workbench). 

The driving issue in the development of our enumerator has been the need to model as wide a range of secondary 
structures as possible. We chose non-pseudoknotted secondary structures, a class that includes not only the linear and 
“hairy linear” structures of DSD, but also branched tree-like structures with hairpin loops, bulge loops, interior loops, 
and multiloop junctions (Fig. |S3| l. (Pseudoknotted structures, in which branches of a tree can contact each other to 
form cycles, require more complicated energy models, have geometrical constraints, and are not extensively used in 


dynamic DNA nanotechnology yet, so version 1.0 of Peppercorn does not allow them. See Appendix D and Fig. S4 
for further explanation.) 

The choice to allow arbitrary non-pseudoknotted structures, and arbitrary hybridization interactions between them, 
introduces some problems that don’t arise in restricted models such as Visual DSD. For example, some sets of species 
may allow for the generation of infinitely long polymers (Fig.[T]l; a naive attempt to enumerate all possible reactions 
in such a system would prevent the algorithm from terminating. In the laboratory, these polymers will not occur at low 
concentrations if kinetically faster reactions are possible that preclude polymerization. Therefore, the enumerator must 
provide some plausible approximation for a separation of kinetic timescales. We find here that the low-concentration 
limit, in which unimolecular reactions are sufficiently faster than bimolecular reactions, provides a clean and rigorous 
basis for a semantics that avoids the spurious polymerization issue. 

Finally, it is desirable that the results of this enumeration be interpretable to the user—an excessively complex 
reaction network will be useful only for performing kinetic simulations and will make it hard to distinguish intended 
reaction pathways from unintended side reactions. Simplification may be necessary for further verification (for in¬ 
stance, comparison of the system’s predicted behavior to a specification, given in a higher level language). It is 
essential that any simplification has some guaranteed correspondence to the full, detailed reaction network. Given 
a presumed separation of timescales, one way to make the network more interpretable is to show only the “slow” 
reactions, and to assume that “fast” reactions happen instantaneously. We will explore this idea in detail later. 

As a motivating example, we show the 3-arm junction system of Yin et al. in Fig. [ 2 ]®; this system involves the 
assembly of a 3-arm junction from a set of metastable hairpins, triggered by the addition of a catalyst (Fig.|^). The 
catalyst opens one hairpin by toehold-mediated branch migration, exposing another toehold (previously sequestered 
within the hairpin stem); this toehold similarly opens the second hairpin, which can open the third hairpin. In the final 
step, a four-strand intermediate complex (containing the three hairpins and the initiator) can collapse to the final 3-arm 
junction structure, releasing the single-strand catalyst. Even for this simple process, the full, detailed reaction network 
is rather complicated (because of the separate toehold-nucleation steps necessary for attachment of each hairpin to 
the growing structure, as well as various unintended side reactions). These steps result in transient intermediates 
that quickly decay to a more stable “resting complex.” By skipping these transient intermediates (and depicting only 
bimolecular transitions between sets of resting complexes), the reaction network can be greatly simplified (Fig.|^). 
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Figure 2; Catalytic 3-arm iunction formation. We show the catalytic 3-arm junction assembly process (originally 
demonstrated by Yin et al. as enumerated by our software; the intended pathway is highlighted in red and purple. 
Boxes with rounded edges represent complexes—initial complexes have blue backgrounds, “transient” complexes have 
dashed borders (e.g. complex 5), and “resting complexes” have yellow backgrounds. Circular nodes are reactions— 
“unimolecular” (blue) and “bimolecular” (red) reactions. “Resting sets” are shown as light yellow boxes around 
complexes (in this case, each resting set has only one complex), (a) Detailed reactions. A set of three strands (n2, 
n3, n4) begin as metastable hairpin monomers. A single-stranded initiator (nl) induces toehold attachment (5) and 
opening of the first hairpin, exposing a sequestered toehold (11). The opened hairpin-initiator complex can bind the 
second hairpin (n3) in a similar process, and so on for the third hairpin (n4). Finally, remote toehold-mediated branch 
migration allows n4 to displace the initiator nl, allowing for multiple turnover. Greyed-out reactions and complexes 
indicate non-productive binding, (b) Condensed reaction graph, showing only reactions between “resting sets.” (c) 
Key complexes are shown in detail. 


4 

















































3 Reaction Enumeration Model 

3.1 Primitives and Definitions 

We begin by defining some terms necessary for a discussion of the reaction enumeration algorithm and process. These 
terms are also demonstrated in Fig.|^ 

Definition 1. A domain d = (z, a) is a tuple where z is the name of the domain and a is the (optional) string 
of nucleotides from the alphabet S = {A, C,T, G}. The length of a domain d is written as |d| and represents 
the number of nucleotides in the sequence a. The enumerator semantics depend only on |d| and not on the actual 
sequence, thus allowing the domain-level dynamics to be specified prior to sequence design. A domain d = (z, a) has 
a complementary domain d* whose name is generally z* and whose sequence is the Watson-Crick complement of s. 

Definition 2. A strand s = (z, D) is a tuple where z is the name of the strand and Z? is a sequence of domains 
{di,d 2 , ■ ■.), ordered from the 5’ end of the strand to the 3’ end. Two strands are equal if and only if their names are 
equal and they contain identical sequences of domains. 

Definition 3. A structure T — {Ti,T 2 , ■ ■ - Tn) (for a sequence of n strands) is a sequence of lists of bindings, 
where each binding is either a pair of integers or 0. That is, Ti = {Ti^i,Ti^ 2 , • ■ •) is the structure for strand i. 
Tij = {si,j,dij) is a tuple indicating that domain number j within strand number i is bound to domain number dij in 
strand number Sij. This encoding is redundant because Tij = {i' ,j') j' = (*j j)- if ^ domain is unbound. 

Definition 4. A complex c = (z, S', T) is a tuple where z is the name, S is the sequence of strands in the complex, and 
T is the structure of the complex. 

• A complex can be rotated by circularly permuting the elements of S and T (e.g. S' = (S„, Si, S 2 ,..., S„_i)) 
and setting Si j = l + ((sij+n—1) mod n)for all Sij G TiforallTi G T). Note that for a non-pseudoknotted 
complex, the sequence of strands has a unique circular ordering in which the base pairs are properly nested, but 
there are multiple equivalent circular permutations of the complex ED Therefore we say the resulting complex 
is in canonical form if the lexicographically (by the strand name) lowest strand S^ appears first (e.g. Si = Sf)- 
A complex may be rotated repeatedly in order to achieve canonical form. Two complexes in canonical form are 
equal if they contain the same strands in the same order, and their structures are equal. 

• A complex must be “connected”, which means that all strands in the complex are connected to one another. 
Two strands Si and S 2 are “connected” if either Si is bound to S 2 or Si is bound to some other strand that is 
connected to S 2 . 

Definition 5. A reaction r = {A, i?) is a tuple where A is the multiset of reactants and B is the multiset of products, 
where reactants and products are both complexes. We write p{r) to represent the rate of r. 

• The arity a(r) of a reaction r is a pair (|A|, |i?|), where |A| counts the total number of elements in A, and 
similarly for \B\. Any reaction with arity (1, n) is unimolecular, reactions with arity (2, n) are bimolecular, and 
those with other arities are higher order. For the sake of this paper, we do not consider (0, n) or (n, 0) reactions 
(those which birth new products from no reactants or cause all reactants to disappear). 

• A reaction may be classified as fast or slow, we make this separation based on the arity of the reaction (uni¬ 

molecular reactions are fast, while bimolecular and higher-order reactions are slow). This assumption is rooted 
in the Law of Mass Action for chemical kinetics. For a unimolecular reaction A ^ B, the rate puni of the 
reaction is given by puni = fcuni [^] where [A] is the concentration of species A, and fcuni is a rate constant. For 
a bimolecular reaction A + B ^ C, the rate pbi of the reaction is given by pbi = A:bi[A][iZ], where [A] is the 

concentration of species A, \B] is the concentration of B, and fcbi is a rate constant. 

This means that, as the concentration of all species decreases, the rates of bimolecular reactions decreases more 
quickly than the rates of unimolecular reactions. Intuitively, this is because bimolecular reactions require the 
increasingly improbable collision of two species (rather than one species simply achieving sufficient transition 
energy). Therefore, the assumption of a “clean” separation of timescales (such that all unimolecular reactions 
occur at a faster rate than any bimolecular reactions) is only valid at sufficiently low concentrations. 

• For a set R of reactions, we sometimes write Rf to represent the fast reactions and Rg to represent the slow 

reactions, such that R = Rf U Rg. Finally, it will be sometimes useful to partition Rf into (1,1) and (l,n) 

reactions, such that Rf = Rj’^'^ U where by convention n > 1. 
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Definition 6. A resting set Q is a set of one or more complexes—strongly-connected by fast (1,1) reactions—with 
no outgoing fast reactions of any arity. That is, fast reactions can interconvert between any of the complexes within 
Q, but no fast reaction is possible that transforms a complex in Q to any complex outside of Q. We write this as: 
Vr = {A, B) G Rf. A C Q B C Q. 

• Complexes which belong to resting sets are called resting complexes', all other complexes are transient com¬ 
plexes. 

Definition 7. A reaction network is a pair G = (C, R) where C is a set of species (either complexes or resting sets) 
and i? is a set of reactions between those species. 

• When C is a set of complexes, we refer to G as a “detailed reaction network.” In Sec. |^we discuss reaction 
networks where G is a set of resting sets; these are “condensed reaction networks.” 

• Although we may informally refer to a reaction network as a “reaction graph,” the presence of bimolecular reac¬ 
tions means that most reaction networks cannot be simply represented as a graph whose vertices are complexes 
and whose reactions are edges. 

If R contains only (1,1) reactions, than we can consider G to be a directed graph, where G are the vertices, 
and each reaction r = ({a} , {h}) G i? is an edge connecting the reactant complex a to the product complex b. 
We use this representation to dehne a notion of resting sets (above), and to provide a scheme for condensing a 
reaction network. 

Alternatively, a reaction network may be represented as bipartite graph G' = (V, E), where the set of vertices 
V = C U R, and the edges in E are pairs (c, r) or (r, c) for some r G R, c G C. For instance, this would allow 
a reaction r such as: 

X + Y -)■ Z 

to be represented by the graph G = ({-A, Y, Z} U {r} , {(A", r), {Y, r), (r, Z)}). We use this representation only 
when visually depicting reaction networks, as in Fig. In all other cases in this paper, the graphs we refer to 
will be unipartite where the vertices are species. 


Definition 8. A “reaction enumerator” is therefore a function which maps a set of complexes C to a set of reactions 
TZ between those complexes. Therefore application of a reaction enumerator to a set of complexes yields a reaction 
network. 

We note briefly that these notions are related to other models of parallel computing, particularly Petri nets F2|43| 
Our reaction networks could also be considered Petri nets, where the species (either complexes—for a detailed reaction 
network, or resting sets—for a condensed reaction network) are “places” and the reactions are “transitions.” In a test 
tube, there would be a finite number of molecules for each individual species—in the language of Petri nets, molecules 
are called “tokens”, and a “marking” is an assignment of token counts (molecule numbers) to each place (species). 

We emphasize that the “enumeration” task at hand is determination of the CRN/net itself—we are not enumerating 
possible molecule counts/markings of the net. Rather, we assume a finite initial set of complexes and—given some 
rules about how those complexes may interact—attempt to list all possible reactions that may occur. In the next 
section, we describe these interaction rules. 

3.2 Reaction types 

The enumerator relies on the definition of several “move functions”—a move function takes a complex or set of 
complexes and generates all possible reactions of a given type. In principle, many reaction types are possible; however, 
we restrict ourselves to the following four basic types, further classifying each type by arity. Here we provide the move 
functions for each of these reaction types. The different reaction types are summarized in Fig.[^ 

Note: when clear from context which complex is under consideration, we will sometimes write dij to refer to 
domain dj on strand Si in this complex. Similarly, if X = {i,j), we may write dx to refer to dij. 

Binding - two complementary, unpaired domains hybridize to form a new complex (Fig.[^). 


(1,1) binding - binding occurs between two domains in the same complex. These reactions can be dis¬ 
covered by traversing each domain on each strand and looking ahead for higher-indexed domains that 
are complementary (within a multiloop, skipping over stems that lead to branched structures, in order to 
avoid generating pseudoknots); a separate reaction is generated for each separate complementary pair. See 
Alg. SI for pseudocode. 
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• (2,1) binding - binding occurs between two domains on different complexes. These reactions can be found 
by comparing each domain in one complex c to each domain in another complex c', generating a separate 
reaction to join each complementary pair that is found. See Alg.|^in the apppendix for pseudocode. 


Opening - two paired domains in a complex detach (Fig.[^). We assume that there is some threshold length L, such 
that duplexes less than or equal to L nucleotides long can detach rapidly (in unimolecular fast reactions), while 
domains longer than L bind irreversibly. L is a parameter that can be set by the user; the default value is L = 8. 


These reactions can be found by examining each domain dj on a complex that is bound to a higher-indexed 
domain, looking to the left and right of that domain (along the same strand) to determine the total length £ in 
nucleotides of the helix containing dj. If £ < L, a new reaction is generated that detaches all bound domains in 
the helix. See Alg. S3 in the appendix for pseudocode. 


• (1,1) opening - two paired domains in a complex detach, but the complex remains connected 

• (1,2) opening - two paired domains in a complex detach, and the complex dissociates into two complexes 


3-way - an unpaired domain replaces one domain in a nearby duplex (branch migration) (Fig.[^). These reactions can 
be discovered as follows: for each bound domain with an adjacent unbound domain on each strand (the invading 
domain), look first to the left and then to the right of that domain—skipping over internal loops—for a third 
bound domain (the displaced domain) that is complementary to the invading domain. If the displaced domain 
is directly adjacent to the bound domain (on the same strand), then this is “direct toehold-mediated branch 
migration”; otherwise (e.g. if the bound domain and the disp laced domain are separated by an internal loop), 
this is known as “remote toehold-mediated branch migration.” l25|^See Alg. S4 in the appenix for pseudocode. 


• (1,1) 3-way - the branch migration results in a complex which remains connected 

• (1,2) 3-way - the branch migration releases a complex 


4-way - a four-arm junction is re-arranged such that hybridization is 
See Alg. S5 in the appendix for pseudocode. 


exchanged between several strands (Fig. [^). 


• (1,1) 4-way - the branch migration results in a complex which remains connected 

• (1,2) 4-way - the branch migration releases a complex 


3.3 Separation of timescales 

Reactions are classified as fast or slow according to their arity; as discussed above (l,n) reactions are fast for all 
n > 0, while all other reactions are slow. Complexes are partitioned into “transient complexes” and “resting com¬ 
plexes.” Resting complexes are those complexes that are part of a “resting set”, while transient complexes are all 
other complexes, resting sets are calculated as follows: consider a reaction network C, i?; we can construct a graph 
G = (C, where is the set of fast (1,1) reactions in the reaction network. We compute the set W of 

“strongly-connected components” of G; Each component Wi € W is a resting set if and only if there are no outgoing 
fast reactions for any complex in Wi. That is, for some resting set Q G W, any fast reaction consuming a complex in 
Q produces only products that are in Q. 

Intuitively, resting sets are sets of complexes that can interconvert rapidly between one another, whereas transient 
complexes are those which (via only fast reactions) rapidly react to become different complexes (eventually resting 
complexes). Given infinite time, all molecules that begin in transient complexes will eventually reach a resting set, 
and all complexes within a resting set will be visited infinitely often by any molecule that remains in the resting set. 

Our assumption of a separation of timescales allows the following simplification to be made—transient complexes 
are unlikely to undergo slower bimolecular (or higher-order) reactions, since fast reactions rapidly transform them to 
other complexes. We therefore assume that only resting complexes can participate in slow reactions. This simplifi¬ 
cation avoids consideration of a large number of highly unlikely reaction pathways and intermediate complexes. For 
instance, it significantly reduces the problem of potentially infinite polymerization, as shown in Fig.[^. Similarly, con¬ 
sider the example of the three-arm junction (Fig.|^. Before the junction closes, displacing the initiator, it is possible 
for another hairpin to bind the exposed third-arm, catalyzing the formation of a fourth arm; this process could repeat, 
creating an implausibly large polymeric structure. We recognize that, in the limit of low concentrations, the bimolecu¬ 
lar association will be much slower than the unimolecular branch migration. By classifying the intermediate complex 
as transient (to be quickly resolved by the formation of a completed 3-arm junction), we prohibit the bimolecular 

* An example can be seen in Fig.j^, in the reaction 36—>41-|-nl 
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Figure 3; Available reaction types, (a) Bind (1,1) and (2,1). (b) Open (1,1) Md (1,2). (c) 3-way branch migration (1,1) 
and (1,2). (d) 4-way branch migration (1,1) and (1,2) 


association reaction that could result in the polymer. Additionally, since the enumeration of unimolecular reactions is 
linear in the number of species, while enumeration of bimolecular reactions is quadratic, eliminating the consideration 
of bimolecular reactions between transient complexes effectively reduces the complexity of the enumeration problem. 

It should be noted that the set of reactions presented in our model still does not describe all possible behaviors 
of DNA strands—notably, “partial” binding between domains is not possible, and none of the reactions produces 
complexes that are pseudoknotted. Additional details are provided in the appendix. 

4 Reaction Enumeration Algorithm 

The essence of the reaction enumeration algorithm is as follows; 


1. Exhaustively enumerate complexes reachable from the initial complexes by fast reactions. 

2. Classify those complexes into transient complexes or resting complexes (by calculating the SCCs in terms of 
fast reactions and marking those SCCs with no outgoing fast reactions resting sets). 

3. Enumerate slow (bimolecular) reactions between resting complexes. 

4. Repeat this process for any new complexes generated by slow (bimolecular) reactions. 

Additional details of the algorithm are provided in Sec.[A| 

5 Condensing reactions 

It may not always be desirable or efficient to view—or even to simulate—the entire reaction space, including all tran¬ 
sient complexes and individual binding/unbinding events. Eor this reason, we provide a mechanism of systematically 
summarizing, or “condensing” the full network of complexes and reactions. In this condensed representation, we 
represent “transitions” between resting sets as a set of “condensed reactions,” where the reactants and products of 
condensed reactions are resting sets, rather than individual complexes. Recall that a resting set is a set of one or more 
complexes, connected by fast reactions (and with no outgoing fast reactions). All of these condensed reactions will 
be bimolecular. Intuitively, if we assume that fast reactions are arbitrarily fast, then looking only at the resting sets, 
the condensed reactions will behave exactly the same as the full reaction network. We formalize the correspondance 
between trajectories in the detailed and the condensed reaction networks in Sec.|B| 
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We define the “condensed reaction network” to be the reaction network G = (Q, TZ), where Q is the set of resting 
sets and TZ is the set of condensed reactions. For clarity, we will refer to the “full”, non-condensed reaction network 
G = {C,TZ) as the “detailed reaction network”. The goal of this section, therefore, is to describe how to calculate 
the condensed reaction network G from the detailed reaction network G. We later prove some properties of this 
transformation—namely that transitions are preserved between the two reaction networks in Sec.[Bl 

Since much of the following discussion will involve both sets (which may contain only one instance of each 
member), and multisets (which may contain many instances of any element—for example, the reactants and products 
of a chemical reaction are each multisets), we will use blackboard-bold braces {| [|- to represent multisets and normal 
braces { } to represent sets. Let us also define a useful operation on sets of multisets. Let A and B be sets of multisets; 
then we will define the “Cartesian sum” of A and B tohe A (B B = {a + b : a £ A,b £ B}. ^ It is easily shown 
that the Cartesian sum is associative and commutative. We will therefore also sometimes write gs represent 
6 i © 62 © ■ • • for all bi £ B. 

We will attempt to present this section as a self-contained theory about condensed reaction networks that will be 
independent of the particular details of the detailed reaction enumeration algorithm. However, we will make certain 
restrictions on the detailed reaction network to which this algorithm is applied. The reader can verify that the reaction 
enumeration algorithm presented above satisfies these properties, even when the enumeration terminates prematurely 
(as in Sec. |A.l| ): 

1. All fast reactions are unimolecular, and all slow reactions are bimolecular or higher order. 

2. Reactions can have any arity (n, m), as long as 0 < n < 2 and m > 0. 

3. For any sequence of unimolecular reactions, where each reaction consumes a product of the previous reaction 
and the last reaction produces the original species, the sequence must consist only of 1-1 reactions^ 

4. Reactants of non-unimolecular reactions must be resting complexes. 

Fig. [^provides an example of the reaction condensation process. Before we explain this process, let us clarify a 
few properties of the detailed reaction network, based on our definitions. Considering the detailed reaction network 
G = (C, 72./ U 72s), let 72be the fast (1,1) reactions in 72. First note that the complexes in C and the fast reactions 

72y^’^^ form a directed graph F = (C,72^^’^^). We will be calculating the strongly-connected components (SCCs) 
of this graph using Tarjan’s algorithm, as we did in the reaction enumeration algorithm. Note that every complex 
(whether a resting complex or a transient complex) is a member of some SCC of F. Let us denote by S(a;) the SCC 
of F containing some complex x. The resting sets of G are the SCCs for which there are no outgoing fast reactions 
of any arity. There may be (even large) SCCs that only contain transient complexes, because there is at least one 
outgoing fast reaction from the SCC; for instance, in Fig.|^ {1,2} is an SCC, but is not a resting set (because of the 
fast reaction 2 —> 3). Also note that many SCCs may contain only one element (for instance, {3} is an SCC). Finally, 
observe that we can form another graph, F', where the nodes are SCCs of F, and there is a directed edge between 
SCCs if there is a (1, m) reaction with a reactant in one SCC and a product in the other. That is, F' = {S, E) where 
E — {S{a), S{bi)Wbi £ B :r = ({|a[}-,i3) £ Ro A a{r) = (l,m)}. F' is a directed acyclic graph (since all cycles 
were captured by the SCCs) The leaves of F' (those vertices with no outgoing edges) are the resting sets of G. 

Definition 9. A “fate” F of a complex a: is a multiset of possible resting sets, reachable from x by fast reactions. 

F is a multiset because, for instance, x may be a dimer that can decompose into two identical complexes, both of 
which are resting complexes, such that; x ^ y + y. Any complex x may have many such fate^ and all complexes 
must have at least one fate. We will denote the set of such fates by F(a;). Computing F(a;) for all complexes x in 
the ensemble therefore allows each complex to be mapped to a set of possible resting set fates. More generally, we 
may think about the fate of a multiset of complexes; this answers the question “to what combinations of resting sets 
can some molecules evolve, starting in this multiset of complexes, exclusively via fast reactions?” Let us define F(X) 
where X = {|a;i, 12, ... to be: 

F(A:) = ^ F(a:) = F(a;i) © F(a:2) © .. • 

xGX 

^This operation is equivalent to taking the Cartesian product A X B = {(a, b) : a £ A, b £ B}, then summing each of the individual pairs and 
giving a set containing all the sums. That is, A © iJ = |X]a;gp x : p £ A X b'^. The result is therefore a set of multisets 

^Consider a pathological reaction network such as a—>fe + c, c-^-a (such a reaction network would not be generated by our enumerator, 
because the number of DNA strands are conserved across reactions; this network would also not satisfy property 3). These types of reactions prevent 
us from finding meaningful SCCs. 

'^It is important to note that F' is not suitably described as a quotient graph on a transitive closure of fast (1,1) reactions, since (1, m) reactions 
can also yield edges. 

^Consider, for instance, the complex 17 in Fig.[^ in this case, there are two possible reactions resulting in different combinations of resting sets; 
therefore 17 has two possible fates—either {| {12} , {13} } or {| {21} , {20, 26} } 
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The intuition is that, since fates for different complexes are independent, the set of possible fates of X is the set of 
all possible combinations of the fates of Xi, x^, etc. From here, we can define the related notion for some reaction 
r = {A, B), where B = {|6i, & 2 , • ■ • bn^- Let IR(r) = ¥{B), such that: 

M(r) = ¥{B) = 0 F(6) = F(&i) © F(& 2 ) © ... © F(6„). 

beB 

Finally, let Ro{S) be the set of reactions leaving some SCC S of F. 

With these definition, we can provide an expression for F(a;) in terms of a recursion: 

X /{S(a;)} if S(a;)is a resting set 

\Ur.Gfi„(S(Jt))'^(^) otherwise 

The “base case” is that, if x is a resting complex, then F(x) is just x’s resting set. The recursive case can be evaluated 
in finite time, because F' is a directed acyclic graph. That is, if we start with some arbitrary transient complex x, 
the recursion can be evaluated by a depth-first traversal of F', starting from x; since F' is acyclic, each branch of the 
depth-first traversal will terminate at a leaf of F'—a resting complex for which F(x) is trivial. 

Once we have computed F(x), we can easily calculate the set of condensed reactions. Since F(x) captures all 
of the information about the fast reactions in which x participates, we must only consider slow reactions. For each 
slow reaction s = {A, B) in the detailed reaction network, where A is the multiset of reactant complexes and B is 
the multiset of product complexes, we will generate some number of condensed reactions—one for each fate of B. 
Specifically, the condensed reaction network G = (C, R) has C being the set of resting sets; we build R as follows: 
for each slow reaction s = {A,B) G R, with S(Gl) = {|S(ai) : G A[}., then for each F G F[s), we add a 

condensed reaction {S{A),F) to R. In Alg. |S7| we describe precisely how to calculate F(x), and then how to compute 
the condensed reactions from F(x). In Sec.|B|we present a set of theorems justifying the choice of this algorithm. 

6 Approximate Kinetics 

Thus far we have only discussed the kinetics of the system in the context of our assumption that the timescales for 
“fast” and “slow” reactions can be separated. We have not addressed the fact that, even within the sets of fast and 
slow reactions, different reactions may occur at very different rates. Different rates of different reactions can greatly 
impact the dynamics of the system’s behavior. Therefore in order to make useful predictions about the experimental 
kinetics of the reaction networks we are studying, we need to consider the rates of these reactions, in addition to 
simply enumerating the reactions themselves. 

In this section, we present a method for approximating the rate laws governing the rates of each of the “move types” 
discussed in Sec. |3.2[ along with an extension for calculating the rates of condensed reactions. Our implementation 
automatically calculates these rates and includes them in its output, allowing the user to import the resulting model into 
a kinetic simulator such as COPASI, provide initial concentrations for species, and simulate the resulting dynamics. It 
is important to emphasize that the rate laws that we provide (in particular, the formulas we give for the rate constants), 
although based on experimental evidence and intuition, are heuristic and approximate. Also note that the kinetics of a 
real physical system will be affected by parameters outside the consideration of our model—for instance, the sequences 
of each domain (and therefore the binding energies) will be different, and the temperature and salt concentrations can 
both change the energetics of the interactions. These changes can greatly affect the real system’s overall dynamic 
behavior. The formulas we give here are intended to roughly approximate the rate constants at 25 °C and 10 mM 
Mg2+. 

Let p{r) be the rate of some reaction r = {A, B), where A — oi, 02 ,.... For the sake of this section, we will 
assume that p{r) is given by: 

p{r) = fc [a] 

a&A 

where [a] represents the concentration of some species a, and k a constant, which we call the “rate constant”. Implicit 
in this choice of rate law is the assumption that all reactions are elementary (meaning there is only a single transition 
state between the reactants and the products); in reality, this is not the case, as most DNA strand interactions have 
a complex transition state landscape and involve many intermediate states. Since the enumerator has no knowledge 
of the concentrations of any species, the problem of estimating the rate p{r) of the reaction boils down to estimating 
the rate constant k. We will therefore attempt to provide formulas for approximating these rate constants, such that 
the overall reaction rates are consistent with experimental evidence in a reasonable concentration regime (sufficiently 
below micromolar concentrations). 

In Sec.|^ we provide approximate formulas for the rate constant k, for each of the reaction types described above. 


10 



I transient complex # bimolecular (slow) reaction 





Figure 4; Reaction Condensation Example. In this example, we attempt to illustrate possible complexities in con¬ 
densing reactions, (a) Detailed reaction network. Beginning with complex gate, rapid interconversion with 2 is 
possible. Depending on the state of the complex, interaction with t23 yields a variety of breakdown products, (b) 
Condensed reaction network. The condensed reactions show two possible bimolecular reactions between gate/2 and 
t2 3, and that each of these reactions has multiple breakdown products. Although there are two parallel pathways 
that yield complexes 12 and 13 in the detailed reaction network (via 6 or via 5 and 17), only a single reaction 
(gate —13 + 12) appears in the condensed reactions, because the two transient pathways have indistinguishable 
fates, (c) Key complexes shown in detail. 
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6.1 Approximate condensed reaction kinetics 

Consider some reaction f = (A, B) where A = {|Ai, A 2 ,... |} and B are multisets of resting sets. Assume f is part 
of a condensed reaction network {C,Tt), and that there is a detailed reaction network {C,Tl). Let be the set of all 
detailed bimolecular reactions in TZ between representations of A: 

= |r = {A', B') : r € 7^, A' is a representation of a| . 

For bimolecular reactions (where A = {Ai , A 2 )), then R^ is given by: 

Ra = = ({|Ai, A2[|', B') : r G TZ, A'l G Ai, A2 € A2I. 

The rate law p{r) for the condensed reaction f will be given by: 

p{f) = fc [A,] 

Ai^A 


where [Ai] represents the concentration of the resting set A^, which we can assume to be in equilibrium relative to the 
fast reactions. We can consider [Ai] to be the sum over the concentrations of the species in Ap. 


= E H- 

a^Ai 


Again, the real challenge will be predicting the rate constant k. 

For simplicity, we will consider only the bimolecular case, where A = (^ 1 ,^ 2 ) (the generalization to higher order 
reactions is simple). The approximate rate constant k is given by 


E 




■ kr -V 


( 2 ) 


where: P 



represents the probability that, at any given time, a single complex in the resting set Ai will be 


Oi, P 


0,2 '■ A2 


represents the probability that, at any given time, a single complex in the resting set A 2 will be 02 , kr 


represents the rate constant for the detailed reaction r, as calculated in the previous section, and P [Tg_j.^] represents 
the probability that the complexes in B decay to complexes that represent 13. Note that, if r produces products which 
can never be converted to the resting sets in B, then this term will be 0. 

The sum is over all reactions which can represent the reactants—that is, we need to consider all possible reactions 
that can consume one complex from each resting set in A. The overall rate for the condensed reaction will be propor¬ 
tional to the rates of those detailed reactions, weighted by the joint probability that those reactants are actually present, 
and that the products decay to the correct resting set. 


To calculate P |^ai : AiJ, P |^a 2 : A 2 ], and P , it is helpful to think of each of the SCCs of the the graph 

F of the detailed reaction network as individual, irreducible Continuous Time Markov Chains (CTMCs). We will use 


the results for resting sets to calculate ! 


tti : Ai 


CL2 '■ A 2 


, and results for SCCs of transient complexes to derive 


^ [^b^b\ ■ 

Consider first the resting sets. We can treat a single instance of the resting set A = {oi, 02 ,... oi,} to be a Markov 
process, periodically transitioning between each of the L states, according to the reactions connecting the various 
complexes oi, 02 ,... in the resting set. We can describe the dynamics of this process by a matrix T G where 

the elements Rj of the matrix represent the rate (possibly zero) of the reaction from state j to state i, which we denote 
p{j A’ diagonal element is the negative sum of the column: 


T- ■ — 


P{j i) 


T 


i = j 


(3) 
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Let s{t) = (si, S 2 ,.. .)^ be an L-dimensional vector giving the probabilities, at time t, of being in any of the L states. 
The continuous-time dynamics of this process obeys 


ds 

dt 


Ts(f). 


For a resting set, we assume that the system has reached equilibrium, and so s is not changing with time. We therefore 
hnd the “stationary distribution” s of this process by setting ds/dt = 0, and recognizing that s is the right-eigenvector 
of T with eigenvalue zero. Given the stationary distribution s = (si, S 2 , • ■ ■, we recognize that 


P[ai:Ai]=Si. (4) 

Next consider the transient SCCs. To hgure the probability P that complexes in B decay to complexes 

that represent B, we will again model the SCC as a Markov process, but we cannot use the stationary distribution 
since the transient SCC may be far from equilibrium. Note that this SCC does not represent a resting set, so there 
are additionally some e outgoing fast reactions that exit the SCC. We will model this SCC, including the outgoing 
reactions, as an (L-l-e)-state Markov process, where each of the e states is absorbing. We want to know the probability 
that, having entered the SCC in some state i G {1... L}, it will leave via some reaction j G {L + 1... L + e}. We 
number the reactions in this way to allow them to be discussed consistently as states in the same Markov process as 
the complexes. 

Assume the SCC is again given by A = {oi, 02 , ... a/,}. Let Q G be the matrix of transition probabilities 

within the SCC, such that Qij is the probability that, at a given time the system’s next transition is from state i to state 
j, where i,jG{l... L}. Let us therefore dehne an additional matrix E G where represents the probability 

that the system in state i G {1... L} transitions next to absorbing state j G {L + 1... Lg}. We dehne the transition 
probabilities Qy and in terms of the transition rates—if is the rate constant of the transition from state i to 
state J, then 


and similarly for Rij. To calculate the probability of exiting via state j after entering through state i, we hrst dehne 
the “fundamental matrix”—which gives the expected number of visits to state j, starting from state i: 


N = ^Q'= = (Il-Q)-i 

fc =0 


(6) 


where 1^ is the L x L identity matrix. Let us dehne the “absorption matrix” B = NR; the entries Bij represent the 
probability of exiting via state j after entering through state i. 

But what we really need is to calculate the relative probabilities of each of the fates of some complex x. Let 
P [x — E] be the probability that x decays into a given fate F. 

{ 1 if S(x) is a resting set and F(x) = {E} 

J2j Bij¥ [xj E] if S(x) is a transient SCC (7) 

0 ifE^F(x) 


where B = [Bij] is the absorption matrix for S(x), i represents the index of complex x in S(x), and j is the index of 
the reaction exits S(x), and P [xj -G F] is the probability that the products of the reaction rj decay to complexes that 
represent E. More formally, for some reaction rj = (C, D), where D — c? 2 , ...[!•; 

I] P[di ^E(]P[d2 ^E']...P[ci„^E;,] (8) 

{F^+F' + ...F'=F} 

The sum is taken over all of the ways that the fates {|E^.|} G {|F((ii) x F(d 2 ) x ... : dk G D'^ can combine such 
that the {|E^[}- sum to our target fate E (that is, Within the sum, we want to calculate the joint probabiliw 

that all of the complexes dk G D decay to their respective fates E^. These terms can be computed recursively by Eq.lA 
Finally, we can write an expression for our quantity of interest. We want to know the probability P 
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however, we realize now that this can be computed easily using Eq. 


p [r 


B^B\ 


r ^ B 


where r is the original, detailed bimolecular reaction. 

Now we have shown how to compute all the terms necessary to evaluate Eq. The structure of our arguments 
has mirrored the algorithm for deriving the condensed reactions—it is simple to efficiently evaluate Eq. and 
alongside Alg. |S7| and therefore to compute a rate constant k for each condensed reaction. 

7 Discussion 

We have presented an algorithm for exhaustive enumeration of hybridization reactions between a set of DNA species. 

Our enumerator is more flexible than previously-presented work, allowing enumeration of essentially all non-pseudoknotted 
structures; previous enumerators such as Visual DSD have been limited to structures without hairpins or internal 
loops®. The imposed separation of timescales allows for general bimolecular reactions to be enumerated without 
producing a large number of physically implausible reactions and forming a (potentially infinite) polymeric struc¬ 
ture. By instead insisting that (at sufficiently low concentrations) all unimolecular reactions will precede bimolecular 
reactions, we allow unimolecular reactions (for instance, ring-closing or branch migration) to terminate such polymer¬ 
ization reactions. The transient intermediate complex, that could otherwise participate in polymerization, is excluded 
from consideration of bimolecular reactions. 

Eurther, we have demonstrated a convenient representation for condensing large reaction networks into more com¬ 
pact and interpretable reaction networks. We have proven that this transformation preserves the relevant properties of 
the detailed reaction network—namely, that all transitions between resting sets are possible in the condensed reaction 
network—and that the condensed reaction network does not introduce spurious transitions not possible in the detailed 
reaction network. 

Our implementation exhaustively enumerates the full reaction network (or a truncated version of the network if 
certain trigger conditions are reached). Simulation can then be performed on the full reaction network to determine 
steady-state or time-course behavior. Visual DSD, includes a “just-in-time” enumeration mode, which combines the 
enumeration and simulation processes. This algorithm begins with a multiset of initial complexes, and at each step 
generates a set of possible reactions among those complexes; these possible reactions are selected from probabilis¬ 
tically to generate a new state. In this way, the model produces statistically correct samples from the continuous 
time Markov chain that represents the time-evolution of the ensemble. This algorithm could be an interesting future 
extension to our implementation. 
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Appendices 


A Reaction enumeration algorithm details 

The reaction enumeration algorithm works by passing complexes through a progression of several mutable sets; as new 
complexes are enumerated, they are added to one of these sets, then eventually removed and transferred to a later set. 
All complexes accumulate in either T (transient complexes) or S (resting state complexes). This progression enforces 
the requirement that each complex be classified as a resting state complex or transient complex, that all fast reactions 
are enumerated before slow reactions, and that complexes are not enumerated more than once. For simplicity, we 
assume an operation Pop(S') exists that removes and returns some element from the mutable set S. 


• B contains complexes that have had no reactions enumerated yet. Complexes are moved out of B into T when 
their “neighborhood” is considered. We define a “neighborhood” about a complex c to be the set of complexes 
that can be produced by a series of zero or more fast reactions starting from c. 

• contains complexes in the current neighborhood which have not yet had fast reactions enumerated. These 
complexes will be moved to JV once their fast reactions have been enumerated. 

• Af contains complexes enumerated within the current neighborhood, but that have not yet been characterized as 
transient or resting states. Each of these complexes is classified, then moved into S or T. 

• S contains resting state complexes which have not yet had bimolecular reactions with set £ enumerated yet. All 
self-interactions for these complexes have been enumerated. 

• £ contains enumerated resting state complexes. Only cross-reactions with other end states need to be considered 
for these complexes. These complexes will remain in this list throughout function execution. 

• T contains transient complexes which have had their fast reactions enumerated. These complexes will remain 
in this list throughout function execution. 


Fig. SI summarizes the progression of complexes through these sets. Additionally, two other sets are accumulated 
over the course of the enumeration: 


• TZ contains all reactions that have been enumerated. 

• Q contains all resting states that have been enumerated. 


B 

T 

^ A! 


5 

(no slow 

^ £ 

(resting 

(no reactions 
enumerated) 

(no fast 
reactions 
enumerated) 

(not classified 
into 

transient/resting) 

\ 

reactions 

enumerated) 

r 

(transient 

complexes) 

state 

complexes) 


Figure SI: Passage of complexes through various lists; complexes begin with no outgoing reactions enumerated (B), 
then have fast reactions in their neighborhood enumerated (T, Af) before being classified into transient (T) or resting 
state (S) complexes; finally slow reactions are enumerated (£). 

In Alg.j^we describe the reaction enumeration algorithm in detail. 

A.l Terminating conditions 

Although our enumerator is designed to avoid enumerating implausible polymerization reactions, as descri bed in 
Fig. 0 it is possible to enumerate systems which result in genuine polymerization, such as those described by 
To allow such enumerations to terminate, our enumerator places a soft limit on the maximum number of complexes and 
the maximum number of reactions that can be enumerated before the enumerator will exit. These limits are checked 
before the neighborhood of fast reactions is enumerated, for each complex in B. The limits are configurable by the 
user. 
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If the number of complexes in f U 5 U T is greater than the maximum number of complexes, or the number of 
reactions in TZ is greater than the configured maximum, the partially-enumerated network is “cleaned up” by deleting 
all reactions in TZ that produce complex(es) leftover in B. That is, no complex will be reported in the output that has 
not had its neighborhood of fast reactions enumerated exhaustively. Evaluating the limits only between consideration 
of each neighborhood (rather than during) prevents the pathological mis-classification of resting state and transient 
complexes, but means the limit may be exceeded if the last neighborhood considered is large. 

B Justification of the condensed reaction algorithm 

We will now justify the algorithm for condensing reactions with several theorems that show the relationship between 
the condensed reaction network G = (C,TZ) and the detailed reaction network G = C,TZ. To do this, we introduce 
two definitions. 

First, we need a notion of what kind of processes from the detailed reaction network are actually included in the 
condensed reaction network. We define a “fast transition” to be a sequence of (zero or more) unimolecular 

reactions that begin from a single initial (transient or resting state) complex x and result in a multiset B of resting state 
complexes. A “resting state transition” T^ai,a 2 ^^B is a sequence of detailed reactions starting with a bimolecular 
(slow) reaction (by definition between two resting state complexes oi and 02), followed by a sequence of (zero or 
more) unimolecular reactions that can occur if the system starts with just oi and 02 present, and such that the final 
state B consists exclusively of resting state complexes. 

Second, we need a notion of correspondence between some reaction in the condensed reaction network and a 
transition that can occur in the detailed reaction network. For some multiset of resting states A = {|Ai, A 2 , .•.[!■, 

where each Ai = ■ • ■}, a “representation” of A is a set containing a choice of one complex Oij from each 

Ai. Note that if any of the sets € A are not singletons, then there are multiple representations of A. For example, 
if A = {|Ai, A2[|-, Ai = {«!,1,01^2}, and A2 = {02.1,02.2}, then there are four possible representations of A; 
joi 1,02 1}, {oi 2, 02 1}, {oi 1,02 2}, or joi 2,02 2}- We can write A ~ A to indicate that A is a representation of 

M ’ 

Lemma 1. For every complex x, and for every fate F in the set of fates F(a;), and for every B such that B F, there 
exists a fast transition 

Proof Consider a single fate F S F(x). In the base case where x is a resting complex, then F(x) = {S(x)} is 
singleton, and we take F = S(x). If S(x) is non-singleton, then any transition between x and another complex in 
b G S(x) will satisfy the property that i? = {|6|} ^ F. If S(x) is singleton (S(x) = (xj), then the transition is 

degenerate but still satisfies the propery that B = {x} ^ F. 

When X is not a resting state complex, recognize that each fate F G F(x) was generated by application of the 
recursive case of Eq. in which a union is taken over outgoing reactions from S(x). That is, each fate F G F(x) is 
generated by some outgoing reaction r = (a, jf) from S(x). Specifically, F is one element of the set K(r) = F(/3) = 
For B ^ F, the fast transition T^^^^b can thus be accomplished by first following r, followed by 
the concatenation of for each h G fi. 

By induction, we recognize that, for any complex x and fate F G F(x), a detailed fast transition can be accom¬ 
plished from xtoB^F. □ 

Theorem 1. (Condensed reactions map to detailed reactions) For every condensed reaction f = (A, B), for every A 
that represents A, and for every B that represents B, there exists a detailed resting state transition Ta^b- 

Proof First, recognize that every condensed reaction f = (A, B) was generated by some bimolecular reaction r = 
(A, A'), where A contains only resting state complexes and represents A. Therefore we must only show that there 
exists a fast transition Tai^b, such that B ^ 13. We recognize that the multiset of products B, of the condensed 
reaction f, was generated from one element of IR(r) = F(A'). Therefore, B is an element of F(A'). By Femmaj^ 
there exists a detailed transition Ta'^b, such that B ^ B. Therefore there exists a transition Ta^b such that A ~ A 
and B ^ B. □ 

Lemma 2. For some complex x, each fast transition such that B contains only resting state complexes, 

corresponds to exactly one fate F G F(x). Specifically, there exists some fate F G F(x) such that B F. 

®Note that ~ itself is not an equivalence relation, since the left-hand side (multisets of complexes) and the right-hand side (multisets of resting 
states) are not members of the same set and therefore neither symmetry not reflexivity hold. It could be said that each of the resting states form an 
equivalence class, and the set Q of resting states is the quotient space of this equivalence class. However, the DAG F' is not simply the quotient 
graph of r (the graph between complexes, connected by (1,1) reactions) under this equivalence relation, because (1,2) reactions are not represented 
in r, yet must still generate possible fates in F'. 
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Proof. Consider the base case where a; is a resting state complex; in this case, all fast transitions from x must lead to 
another resting state complex in S(a;). F(a:) = {S(a;)}, by Eq.[^ and therefore this transition corresponds to the fate 
F = S(a;). 

Consider some detailed fast transition such that i? = {| 6 i, 62 , ■ ■ • [I- contains only resting state complexes. 

We recognize that, if x is not a resting state complex, there must be at least one reaction in this process. The transition 
begins with this initial reaction = ({|a;|},F); Y may have multiple products, each of which decays independently 
to some complex or set of complexes in B. 

Realize that, for some reaction = (Ai_i, Af), by applying Eq.[^we recognize that if a fate F is reachable from 
Ai, then it is reachable from Ai-i. That is, for some fate F, F € F(Ai) F G F(Ai_i). This means that, 

for some prior reaction ri_i = {Ai- 2 , A'f) such that Ai C —that is, a reaction that produces the reactant of 
n—F G K(ri) F G M(rj_i). 

Next, we note that the set of products B of the transition must represent some fate; that is, B ^ F. Since 

B consists exclusively of resting states, F = {|S(&) : b G B'^. Multiple reactions ri, r 2 ,... may have produced the 
complexes in B, let us denote this set Rb- Rb = {fi = {Ai, Bi) G : Si C B}; i? is therefore the sum of the 

products of these reactions: B = =(A b )^Rb Because B F and Eq. j^includes all possible sums oflR(S), 

this means that if we choose fates Fi G F(ri) for each of those reactions, there exists some set {Fi, ^ 2 ,..., F^} such 
that Fi + F 2 + ... + F^ = F. 

Consider one of the reactions ri = {Ai,Bi) G Rbi that produces complex(es) in B. Each fate F' G K( 7 ’i) 
is also a fate of any reaction that produces Ai. This means that, for the particular fate Fi G {Fi, F 2 ,..., F^} 
satisfying jyjLi must also be a fate of any reaction that produces Ai. By induction, we can work backwards from 
Vi all the way to the initial reaction r°, and recognize that Fi C F° for some F° G K(r°). The same is true for all 
reactions ri G Rb- Because the recursive case of Eq.J^sums over all combinations of fates for all such pathways, the 
Ft + F 2 + ... + Fm = F must be a member of K(r*^ and therefore a member of F(a;). □ 

Theorem 2. (Detailed reactions map to condensed reactions) For every detailed resting state transition Ta^b, there 
exists a condensed reaction r = {A, B) such that A represents A and B represents B. 

Proof. Since Ta^b is a transition between two sets (A and B) of detailed resting state complexes, the transition 
consists of two steps: hrst, a bimolecular reaction r = {A, A') converts A to A'\ second, a series of unimolecular 
reactions convert the complexes in A' to B. The algorithm generates one or more condensed reactions for each detailed 
bimolecular reaction. Specihcally, the algorithm generates one condensed reaction for each combination of fates of 
the products in A'. That is, each of the condensed reactions is generated from one element in K(r) = 

By Lemma|^ for each product a' G Jf, F(a') corresponds to the set of possible transitions from a! that result in some 
resting state. Therefore we can choose any possible fast transition between Ta'^b, and it will correspond to some 
element of K(r)—and therefore to a condensed reaction f = {A, B). □ 

Intuitively, these two theorems mean that the condensed reaction network effectively models the detailed reaction 
network, at least in terms of transitions between resting states. The hrst theorem shows that a condensed reaction 
must be mapped to a suitable sequence of reactions in the detailed reaction network. The second theorem shows 
the converse—that any process in the detailed reaction network is represented by the condensed reactions. Having 
proved these theorems, we propose the following corollaries that extend this reasoning from individual (detailed and 
condensed) reactions to sequences of condensed and detailed reactions. We omit the proofs. 

Corollary 1. For any sequence of condensed reactions starting in some initial state A and ending in some final state 
B, and for any A ^ A and for any B ^ B, there exists a sequence of detailed reactions starting in A and ending in 
B. 


Proof omitted. 

Corollary 2. Conversely, for any sequence of detailed reactions starting in some multiset of resting state complexes 
A and ending in some multiset of resting state complexes B, there exists a sequence of condensed reactions starting in 
A and ending in B such that A A and B ^ B. 

Proof omitted. 

C Approximate detailed reaction kinetics 

(2,1) Binding — fc = 1 x 10® s“^, according to experimental evidence 1^1^ . 
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(1,1) Binding — The energetics of 1,1 binding depend strongly on whether the reaction forms an entropically unfa¬ 
vorable “internal loop” or “bulge.” We therefore provide three separate rate constant formulas for these different 
conditions. Zipping is binding between two adjacent domains (e.g. between dij and dij+i). Hairpin closing is 
where two non-adjacent domains bind to form a hairpin. Bulge closing is where two non-adjacent domains bind 
to form a more complex internal loop. 

Zipping — k = 10®/£ s“^, where I is the le ngth of the domain. Estimates range from 10“^ s to 10“® s for the 
zipping time of a single base pair. 

Hairpin closing — k = (2.54 x 10®)(f+5)“® s“^, where £ is the length of the hairpin. Bonnet et al. measured 
various hairpin closing rates and determine empirically that the length dependence is approximately given 
by 

k = a{£ + 5)-'^ 

where the exponent c depends on the temperature and the parameter a is to be htted to the dataEHl, For 
25.7 °C, the authors report c = 3 provides the best ht. We estimated a by a linear ht of {£ + 5)“® to 
experimentally measured values of k, using the following data from Fig. 7 of EH; 


£ 

(f+ 5)-® 

k 

30 

(30 + 5)-® 

5000 

21 

(21 + 5)-® 

10000 

16 

(16 + 5)-® 

20000 

12 

(12 + 5)-® 

50000 


Bulge closing — k = (2.54 x 10®)(.(' + 5)“® s“^, where £! = |y| + |t(;| + 5 approximates the size of the bulge. 
We used the same expression as for hairpins, but must account for the additional length from the region of 
unpaired bases in the bulge. Consider binding between two strands, as in Fig. S2 Fet £' — \y\ + |?ii| +5, 
such that the rate constant is given by fc = a{£' + 5)“®. 



Figure S2: Bulge formation 

Multiloop closing — k = (2.54 x 10®)(£' + 5)“® s“^, where £' = YlT=i Mil + 5(rn— 1) approximates the size 
of the multiloop containing domains di,d 2 , ■ ■ - dm- We used a similar expression as for bulges, but must 
generalize to account for the presence of m stems and loop segments within the multiloop. We therefore 
treat the multiloop size as the length of each domain, plus 5 nucleotides for each stem. 

Opening — k = 10®“^'^^^. Note that 2,1 binding is the reverse reaction of 1,2 opening. Experimental evidence 
suggests this ratio depends exponentially on the length £ of the binding/opening domain ESJ, and therefore: 

^AG°/RT _ ^reverse _ 1,2 Opening rate _ ^Q-ae jyj 
^forward 2,1 binding rate 

where AG° is the Gibbs free energy at standard conditions, T is the temperature in Kelvin, R is the ideal gas 
constant, and a is an unknown constant. Given the binding rate constant fcforward = 1 x 10® s“^, this means 

that the opening rate constant ^reverse = 10*^®““^^ s“^. We know that the typical energy of a single base stack is 
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— 1.7 kcalmol ^ at T = 298 K (25 °C) in a typical salt buffer®. We can use this information to solve for the 
constant a: 

^AG°/RT ^ ^ ^ (_i^rinio)a£ 

AG° =-1.7£ ^ -1.7£= (-i?T In 10)af 
=4> a = 1.7/(i?rinl0) « 1.24 
for i?= 1.99 X 10-3 kcalK-i mol”!, r= 298 K. 

3-way branch migration — We can consider the 3-way branch migration process as consisting of an initiation step 
that takes some time a s and a series of bra nch m igration steps; the time for branch migration steps varies 
quadratically with the length i of the domain®®, with some timescale b s each. The expected time {t) for 
3-way branch migration of ^ bases, including an initiation time b, is 

(t) = a + bi^. 

Imperfectly approximating the completion time of branch migration as a Poisson rate with the same expected 
time, the rate constant k is therefore 


(t) a + bP 

where the values of a and b must be determined experimentally, and represent the expected time for initiation and 
individual branch migration steps, respectively ®. We assume that the values are different for branch migration 
between adjacent domains (dij and dij+i) and for remote toehold-mediated branch migration®, where the 
domains are non-adjacent. 

Adjacent — k = [(2.8 x IQ-^) -p (0.1 x IQ-^ )P'\ ^ s Srinivas et al. measured a = 2.8 x 10 ^ s and 
6 = 0.1 X 10-3 s®. 

Remote toehold-mediated — k = [p(f)(2.8 x 10-3) -i- (0.1 x ^ s-^ We assume the same step 

rate b as for adjacent branch migration above, but assume that the initiation time a is a factor of p{£') slower 
for remote toehold-mediated branch migration than for adjacent branch migration. We define p{£) = 
a/fcmuitiioop closing (-^0’ where fcmuitiioop dosing (-^0 Calculated by the rate constant formula given above for 
multiloop closing over a bulge length £', and a is a constant for remote toeholds, fitted to the experimental 
data. 

4-way branch migration — k = s-^. We use a similar expression to that for 3-way branch migration. Dabby 

measured a = 77 s and b — Is®. 


D Nucleic acid secondary structure and pseudoknots 

Nucleic acid structures can be divide d int o two classes; those with base pairs that can be nested into a tree-like stru cture 
are called “non-pseudoknotted” (Fig. S3 i, while those that do not obey this nesting property are “pseudoknotted” ®® 

(Fig. I 


There are vastly more possible pseudoknotted structures than non-pseudoknotted structures, so allowing 
enumeration of pseudoknotted intermediates greatly increases the possible size of the generated reaction network. It 
would be attractive to enumerate only a subset of pseudoknotted secondary structures, but a poor understanding of 
the e nergetic s of pseudoknotted structures makes it very hard to determine which pseudoknots to allow and which to 
omit 153154|551 Purther, reactions (such as three-way branch migration) that are always plausible for non-pseudoknotted 
structures can yield topologically-impossible structures for pseudoknotted intermediates. For the sake of simplicity 
and efficiency we will therefore restrict our attention to non-pseudoknotted intermediates, for the time being. 

E Limitations 


As mentioned in the main text, there are two major limitations to the enumerator, as presented here. 

The first limitation precludes several behaviors, such as partial binding between similar domains with small num¬ 
bers of nucleotide mismatches (a concept which has been exploited to precisely control the thermodynamics and 
therefore specificity of DNA hybridization reactions®), as well as partial binding at the edges between two adjacent 
domains. 

The second limitation precludes consideration of the wide range of structures that contain pseudoknots. The dis¬ 
tinction between pseudoknotted and non-pseudoknotted complexes is demonstrated in Fig. [^Essentially, a pseudo¬ 
knotted complex is one which contains two intercalated stem-loop structures. Consider a complex with N nucleotides, 
each numbered 1, 2,..., Formally, this is non-pseudoknotted if, for every two base pairs i,j and k, I (where j, k, I 
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Figure S3: Non-pseudoknoted secondary structures. Our enumerator is capable of handling a wide range of non- 
pseudoknotted secondary structures, depicted above. 

are the numerical indices of four nucleotides in a complex) where i < j, k < I, and i < k, then either i < k < I < j 
(that is, the pair fc, (is nested within the pair i, j) or i < j < k < I (that is, the pairs i,j and k, I occur sequentially). 
Pseudoknotted structures do not obey this nesting property. This can be easily seen by drawing the DNA backbone 
of a pseudoknotted structure around a circle, and drawing base pairs as chords of the circle; if these chords cross, a 
complex is pseudoknotted.. 

It should be apparent that there are combinatorially more pseudoknotted structures than non-pseudoknotted struc¬ 
tures. The presence of pseudoknots has important implications for the energetics of a structure—notably, many pa¬ 
rameters of the energetic models that incorporate pseudoknots have not been measured and as a consequence it is 
difficult to accurately calculate the free energy and therefore estimate the stability of pseudoknotted structures. Addi¬ 
tionally, certain reactions have non-intuitive behavior; for instance, consider a 1-1 binding reaction between adjacent, 
complementary domains. For non-pseudoknotted structures, this reaction is always permissible. For pseudoknot¬ 
ted complexes, it is easy to consider scenarios where a naive branch migration would lead to physically implausible 
structures (see Fig.|S4|i. For simplicity and efficiency, we therefore do not consider pseudoknotted complexes. 

F Implementation 

We have implemented the reaction enumeration algorithm, as well as the algorithm for condensing reactions, in a 
command-line utility written in Python. The enumerator allows input to be specified using the Pepper Intermediate 
Language (PIL)—a flexible, text-based language for describing DNA strand displacement systems —as well as a 
simple, JSON-based interchange format. The enumerator can produce both full and condensed reaction spaces for a 
set of starting complexes. It can also write generated systems to several output formats: 

Pepper Intermediate Language (PIL) — Output format resembles input format 1^, but additional enumerated com¬ 
plexes and reactions are added. 
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a) 




1 


b) 



1 



i < j < k < I 



i < j, k < I 
i < k<j <I 



Figure S4; Pseudoknotted vs. non-pseudoknotted structures, (a) Structure containing no pseudoknots; all base pairs 
are either nested (i < k < I < j), or adjacent (i < j < k < 1). (b) Structure containing a pseudoknot; i < j and 
k < I, but i < k < j < I, so the nesting property is not satisfied. This can be seen geometrically in the circle diagram, 
(c) For non-pseudoknotted complexes, 1-1 binding is always permissible and yields plausible structures, (d) For a 
pseudoknotted complex such as this “kissing-loop” motif, binding between adjacent domains may produce unrealistic 
structures (e.g. binding between adjacent domains in the kissing-loop duplex would likely require partial unwinding of 
the duplexes). Lack of a coherent and efficient energetic model for structures with pseudoknots makes the plausibility 
of these structures difficult to evaluate. 
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Systems Biology Markup Language (SBML) — Standard format used by many modeling packages in systems 
and synthetic biology. Allows models to be transferi'ed to numerous simulation and analysis utilities. 

Chemical Reaction Network (CRN) — Simple plain text format listing each reaction on a single line. In this format, 
some reaction A + i?—>• C + would be written A + B -> C + D. 

ENJS — JSON-based output format; ENJS can be visualized interactively using DyNAMiC Workbench—this visu¬ 
alization is discussed in Chapter 2, and examples are shown in Chapter 6 of ref. 

G Supplemental Pseudocode 


Algorithm SI Bind 1-1 move type 

1 

function BINDI 1(c : Complex) 


2 

i? ^ } 

> Initialize empty set of reactions 

3 

c= {S,T),S = {si,S 2 ,---S|s|} 


4 

for all i e {1 ... S' } do 

\> For each strand 

5 

Si = {z,D) 


6 

for all j s {1... ZJ } do 

> For each domain 

7 

if Tij = 0 then 

> If domain dj is unpaired 

8 

i' ^ i 


9 

f ^ J + 1 


10 

while i' < S and ^ {i,j) do 

se = iz\D'),D' = {d[,d'^,...} 

[> Iterate over all domains with higher indices 

11 


12 

if / > \D'\ then 

> If at the end of a strand 

13 


> Move to hrst domain on next strand 

14 

else if Til ji / 0 then 

> If domain is paired 

15 

(*',/) ^ T^i.i 



> Skip over the internal loop between i, j and i', / so we don’t make a pseudoknot 

16 

else if dj = d'j, then 

> If dj is complementary to d'j 

17 

T' 

> Make a new structure 

18 


[> Where domains {i,j) and {i',j') are paired 

19 

c' ^ {S,T') 


20 


> Make a new reaction that yields this complex 

21 

end if 


22 

3' ^ / + 1 

> Go to next domain 

23 

end while 


24 

end if 


25 

end for 


26 

end for 


27 

return R 


28 

end function 
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Algorithm S2 Bind 2-1 move type 

1 

function B1ND21 (c : Complex, d : Complex) 


2 

i? 3— { } 

> Initialize empty set of reactions 

3 

c={S,nc' = {S',T') 


4 

for all d such that 3{z, D) £ S, d G D do 

for all d' such that 3{z', D') £ S', d' £ D' do 

> For each domain on each strand in c 

5 

t> For each domain on each strand in c' 

6 

if d' = d* then 

[> If d and d' are complementary 

7 

S" ^SU S' 


8 

T" £- combine structures, pair d and d' 


9 

c" ^{S",T") 

> Make new complex that joins c and c' 

10 

R^Rumc,c'l^c"ff)} 

> Make reaction that yields this complex 

11 

end if 


12 

end for 


13 

end for 


14 

return R 


15 

end function 
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Algorithm S3 Opening move type 


1 

function OPEN(c : Complex) 


2 

i? 4- { } 

> Initialize empty set of reactions 

3 

c= {S,T),S = {si,S 2 ,---S|s|} 


4 

for all i e {1 ... S' } do 

t> For each strand 

5 

Si = {z, D), D = {di,d 2 ,... d|D| } 


6 

for all j e {1... \D\} do 

i> For each domain 

7 

if Tij 7 ^ 0 and Tij > {i, j) then 

> If domain dj is paired to a later domain 
> Find the beginning and end of the helix where dj is bound 

8 

A = (^aJa) ^ (i,j) 

> Beginning of the helix on the “top” strand 

9 

B = {iB,jB) ^ Tij 

> Beginning of the helix on the “bottom” strand 

10 

^ -A 

i> End of the helix on the “top” strand 

11 

B' = (*B', js') ^ B 

> End of the helix on the “bottom” strand 

12 

i 4— \dj\ 

> Keep track of helix length in nucleotides 

13 


> Eind the end of the helix 

14 

while jj < and jg > 0 and A' 

paired 

= Tb' do > While neither strand is broken and domains are 

15 

Ja' ^ Ja' + 1 

i> Move right 

16 

JB' ^ 3b' - 1 


17 

£ 4- £ + \dj^, 


18 

end while 


19 


> Eind the beginning of the helix 

20 

while jA > 0 and jb < sib and A 

paired 

= Tb do > While neither strand is broken and domains are 

21 

3A ^ jA - 1 

t> Move left 

22 

3B ^ js + 1 


23 

£ 4- £ + \djj^\ 


24 

end while 


25 

if £ < L then 

> If the total length of the helix is less than the threshold 

26 

T' 


27 

for all / G {jA---jA'}do 
Update T' to break pair i, / 


28 


29 

end for 


30 

if S, T' is not connected then 


31 

c', c" 4— split T' and S into two connected complexes 

32 

R^RUimAc'A'l)} 

> Make reaction that yields these complexes 

33 

else 


34 

c' ^ (S,T') 

t> Make new complex 

35 

R^RUiiMAcl)} 

i> Make reaction that yields this complex 

36 

end if 


37 

end if 


38 

end if 


39 

end for 


40 

end for 


41 

return R 


42 

end function 
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Algorithm S4 3-way branch migration move type 

1 

function 3way(c : Complex) 


2 

i? 4- { } 

> Initialize empty set of reactions 

3 

c= {S,T),S = {si,S 2 ,---S|s|} 


4 

for all i e {1... S' } do 

[> For each strand 

5 

Si = {z, D), D = {di,d 2 ,... d|D|} 


6 

for all j e {1 ... \D\} do 

i> For each domain 

7 


[> Look to the left 

8 

if Tij ^ 0 and j ^ \D\ and = 0 then 


9 

(*',/) ^ ihj + 1) 

> (z, j + 1) will be the invading domain 

10 

repeat > Search left to find 

a bound domain (z', /) that can be displaced 

11 

/ ^ ~ 1 


12 

if Ti' j' 0 and di’ ^ can pair djj^i then 


13 

r^T where = (*, j + 1), TC+i = ( 


14 

if S, T' is not connected then 


15 

c', c" 4— split T' and S into two connected complexes 

16 


> Make reaction that yields these complexes 

17 

else 


18 

c' ^ {s,r) 

[> Make new complex 

19 

R^RUimAcl)} 

> Make reaction that yields this complex 

20 

end if 


21 

end if 


22 

(*^/) ^ 

> Follow the structure 

23 

until / = -1 or = 0 or {i',f) = {i,j) 


24 

end if 


25 


[> Look to the right 

26 

if Tij / 0 and j / 0 and / = 0 then 


27 

(*',/) ^ ifj - 1) 

> (z, j — 1) will be the invading domain 

28 

repeat > Search right to find 

a bound domain (z', /) that can be displaced 

29 

/ ^ + 1 


30 

if /' j' / 0 and df j' can pair /_i then 


31 

r^T where = {i,j - 1), = ( 

i',f) 

32 

if S, T' is not connected then 


33 

c', c" 4— split T' and S into two connected complexes 

34 


> Make reaction that yields these complexes 

35 

else 


36 

c' ^ {s,r) 

[> Make new complex 

37 

R^RUimAcl)} 

> Make reaction that yields this complex 

38 

end if 


39 

end if 


40 

until / = \D\or = {i,j) 


41 

end if 


42 

end for 


43 

end for 


44 

return R 


45 

end function 



26 




Algorithm S5 4-way branch migration move type 

1 

function 4way(c : Complex) 


2 

i? 4- { } 

> Initialize empty set of reactions 

3 

c= {S,T),S = {si,S 2 ,---S|s|} 


4 

for all i e {1... S' } do 

t> For each strand 

5 

Si = {z, D), D = {di,d 2 ,... d|D|} 


6 

for all j e {1... \D\} do 

i> For each domain 

7 

if Tij 7 ^ 0 and j < \D\ then 

> domain dj must be bound, not be at the end of a strand 

8 

A ^ {i,j + 1) 

> Displacing domain 

9 


> Displaced domain 

10 

if i? = 0 then Continue 


11 

end if 


12 

C* = (icJc) ^ Tij 


13 

C ^ {ic,jc - 1) 

> Template domain (replaces B, binds A) 

14 

if jc < 1 then Continue 


15 

end if 


16 

D^Tc 

> (replaces A, binds B) 

17 

a D = 0 then Continue 


18 

end if 


19 

if i? 7 ^ C and dA = d% and ds = d}^ then t> If this is a four-way branch migration 

20 

T' C, T'c ^ A; 


21 

if S, T' is not connected then 

22 

c', c" 4— split T' and S into two connected complexes 

23 


> Make reaction that yields these complexes 

24 

else 


25 

c' ^ (S,T') 

[> Make new complex 

26 

R^RUiiMAcl)} 

i> Make reaction that yields this complex 

27 

end if 


28 

end if 


29 

end if 


30 

end for 


31 

end for 


32 

return R 


33 

end function 
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Algorithm S6 Reaction enumeration 


1 

2 

3 

4 

5 

6 

7 

8 
9 

10 

11 

12 

13 

14 

15 

16 

17 

18 

19 

20 
21 
22 
23 


> Complexes 
> Reactions 
> Resting states 

\> Enumerate fast reactions from A 

> Find fast reactions from b 


procedure Enumerate(A : {Complex}) 

£ ^ ^ ;T ^ {} 

TZ^ {} 

Q^U 

B^A 

while B ^ {} do 

b ^ pop(S) 

{S', T', Q', R') ^ enumerateNeighborhood(6) 

S ^ SU S'-,T ^Tur-,TZ^nu R'-, QUQ' 

end while 

while 5 ^ {} do > Enumerate slow reactions between resting state complexes 

S -Ir- pOp(iS) 

{R', B') ^ GETSlowReactions(s, 5 U f) > Find slow reactions from s 

S S Li (sj > s moves to £ once slow reactions are enumerated 

TZ ^ TZU R' t> Store new reactions 

B ^ B U B' \ {£ L) S Li T) > Store new complexes 

while B ^ {] do > Enumerate fast reactions from B 

b -L- pop(;B) 

{S', T', Q', R') neighborhood(6) > Find fast reactions from b 

S ^ SU S'-,T ^ruT'-TZ^TZLiR'-,Q^ QLiQ' 

end while 
end while 
end procedure 


24: procedure ENUMERATENeighb0RH00D(c : Complex) t > Calculates fast reactions from c, sorts complexes into 
resting state complexes/transient complexes 

25: J- = jc} [> Complexes from fast reactions in neighborhood 

2&. N = {} t> Complexes in neighborhood 

H-. TZm = {} > (Fast) Reactions in neighborhood 

28: while B ^ {] do > Enumerate fast reactions from each complex in F 

29: / ^ pop(F) 

30: {R', F') ^ GETFastReactions(/) [> Find fast reactions from / 

31: F ^ F\JF'\N 

32: N ^ N\JF' 

33: TZn TZn U R' 

34: end while 

35: Apply Tarjan’s algorithm EH to find strongly-connected components of the directed graph G = {Af, TZn) 

36: Q' ^ {strongly-connected components of G with no outgoing fast reactions} > Resting states are SCCs of G 

37: S" •(— {s : s G <7 for any q G Q'} [> resting state complexes are in a resting state 

38: T' ^ Af \ S' t> Transient complexes are everything else 

39: return {S', T', Q', TZn) 

40: end procedure 


41: procedure getFastReactions(c : Complex) 

> Calculates all fast (unimolecular) reactions that consume c 
42: R ^ fast reactions consuming c, G union of products of reactions in R 

43: return R, C 

44: end procedure 

45: procedure GETSlowReactions(c : Complex, S : {Complex}) 

> Calculates all slow (bimolecular) reactions that consume c and an element of S 
46: R ^ slow reactions consuming c and s G S, G <— union of products of reactions in R 

47: return R, C 

48: end procedure 
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Algorithm S7 Condensing Reactions 

1: Fa; •(— undefined V complexes x > The map F : Complex —> {Fate} is global and begins empty 

2: S' •(— { } > The map S : Complex —>■ {Complex} is global and begins empty 


3: 

4: 

5: 

6 : 

7: 

8 : 

9: 

10 : 

11 : 

12 : 

13: 

14: 


procedure C0NDENSE(C : {Complex}, TZ : {Reaction}) > Computes the fates for each complex, then 

generates a set of condensed reactions 

TZg {r : r G TZ,r is slow} [> Slow reactions 

TZf ^ Tl\ TZs > Fast reactions 


^ {r G TZf : a{r) = (1,1)} > Fast (1,1) reactions 

Use Tarjan’s algorithmic to compute the set of strongly-connected components 
from the graph F = (C, 

S ^ the set of strongly-connected components of F 

S(a;) •(— the strongly-connected component containing complex x,\/x G C 

for all Ce G S do > For each SCC of F 

COMPUTEFATES(C'c, TZf) 

end for 

return condenseReactions(7?.s) 

end procedure 


15: 

16: 

17: 

18: 

19: 

20 : 

21 : 

22 : 

23: 

24: 

25: 

26: 


27 

28 
29 


procedure COMPUTeFates(C : {Complex}, : {Reaction}) 


^ {r = (A, 5) G : A C C, R \ C ^ 0} 
if |i?o| = 0 then 

F(c) ^ {C}VcGC 

else 


> Outgoing fast reactions 
> If no outgoing fast reactions 
> F (c) is the resting state C containing the complex c 
[> If there are outgoing fast reactions 


for all c G C do 

i?U,n) ^ {r G : a(r) = (l,n)} 

Po t— Ur=(A,B)Gfl}l.l) ^ 

for all X G Po do 

If F(a:) is undefined, COMPUTEFATES(S(a;), TZf) 

end for 

®'(c) t— [J I F(c) are the possible fates from outgoing reactions 

r=(A,B)Gflo \beB / 


end for 
end if 

end procedure 


30 

31 

32 

33 

34 

35 

36 

37 

38 

39 

40 


procedure C0NDENSEReacti0NS(7?,s : {Reaction}) > Condensed reaction space from the set of slow reactions 
TZ ■>^ {} > Condensed reactions 

for all s = (A, h ) G TZg do 

for all R' G F(6) do 
(A',R') 

TZ^TZyjr' 

end for 
end for 
return TZ 
end procedure 


\> Fates of reactants are all trivial 
> For each fate of b 
\> Generate new reaction 
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